---
title: "Milwaukee Bucks Analysis"
output: html_notebook
---
*****
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
### __Background__

### In this project, we analyze the performance of the Milwaukee Bucks, a prominent National Basketball Association (NBA) team. We investigate regular-season data and identify the team performance indicators most strongly associated with winning.
*****

### __Source Data__

#### [Primary Data](https://www.basketball-reference.com/teams/MIL/2023/gamelog/)

#### [Secondary Data](https://www.teamrankings.com/nba/rankings/teams/?data=2023-04-22)

#### [Additional Information](https://github.com/Mohamed-Elmz/Milwaukee-Bucks-Analysis)

*****
### __Analysis__
*****
#### Imports & Libraries
```{r}
# install.packages("tidyverse")
library(tidyverse)

# install.packages("rvest")
library(rvest)

# install.packages("janitor")
library(janitor)

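# install.packages("plotly")
library(plotly)

# year() comes from lubridate; tidyverse attaches it from version 2.0.0 onward,
# so loading it explicitly keeps older setups working
library(lubridate)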
```
#### Webpage Links
```{r}
# Primary Data Source from 2020-2023
prime <- "https://www.basketball-reference.com/teams/MIL/2023/gamelog/"
prime2 <- "https://www.basketball-reference.com/teams/MIL/2022/gamelog/"
prime3 <- "https://www.basketball-reference.com/teams/MIL/2021/gamelog/"

# Secondary Data Source from 2020-2023
sec <- "https://www.teamrankings.com/nba/rankings/teams/?date=2023-04-22"
sec2 <- "https://www.teamrankings.com/nba/rankings/teams/?date=2022-06-17"
sec3 <- "https://www.teamrankings.com/nba/rankings/teams/?date=2021-07-21"
```
#### Function to Read Tables 
```{r}
# Use web scraping to collect every HTML table on each page
readTables <- function(urls) {
  tables <- list()
  for (url in urls) {
    temp <- url %>%
      read_html() %>%
      html_nodes(css = "table") %>%
      html_table(fill = TRUE)
    tables <- c(tables, temp)  # append this page's tables to the running list
  }
  return(tables)
}
```
#### Save Tables
```{r}
# Read Tables 
data <- readTables(list(prime, prime2, prime3, sec, sec2, sec3)) 

# Save Game Data for each Year
game_data1 <- data[[1]]
game_data2 <- data[[2]]
game_data3 <- data[[3]]

# Save Rank Data for each Year
rank_data1 <- data[[4]]
rank_data2 <- data[[5]]
rank_data3 <- data[[6]]
```
*****
#### __Technical Challenge:__ Because our tables are web scraped, each contains multiple formatting errors (unnecessary and repeated columns and rows). Wrangling the tables into a unified format is therefore vital. Unfortunately, the inconsistencies between tables make cleaning them all with a single function unrealistic.
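
#### As an illustration of the cleanup described above, the sketch below removes repeated header rows by matching their content instead of their position. It uses a small made-up table; `raw_log` and `clean_log` are hypothetical names, and this is an alternative to the index-based cleaning in this notebook, not the method it uses.
```{r}
# Hypothetical toy table that mimics a scraped game log:
# the real header sits in row 1 and is repeated further down.
raw_log <- data.frame(
  V1 = c("G", "1", "2", "G", "3"),
  V2 = c("Opp", "PHI", "HOU", "Opp", "BRK"),
  V3 = c("Tm", "90", "125", "Tm", "110"),
  stringsAsFactors = FALSE
)

# Promote the first row to column names, then drop it.
clean_log <- raw_log[-1, ]
names(clean_log) <- as.character(unlist(raw_log[1, ]))

# Drop any later row whose first cell repeats the header label.
clean_log <- clean_log[clean_log$G != "G", ]
rownames(clean_log) <- NULL

clean_log
```
#### Matching on content keeps working if the number of games changes, whereas fixed row indexes have to be rechecked for every season.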
*****
#### Game Data Wrangling and Cleaning
```{r}
# Game data for year 2022-2023
game_data1 <- game_data1[-5,] # drop row 5
game_data1 <- game_data1[,-c(4,1)] # drop columns 1 and 4
game_data1 <- game_data1 %>% row_to_names(row_number = 1) # reset cols
game_data1 <- game_data1[-c(20,21,42,43,64,65,86,87),] # drop redundant rows
game_data1 <- game_data1[,-c(21,22,23,24,25,26,27,28,29,30)] # opponent stats
game_data1 <- game_data1[,-c(21,22,23,24,25,26,27)] # redundant columns
colnames(game_data1)[6] = "opp points"
colnames(game_data1)[4] = "Outcome"
game_data1$Date <- as.POSIXct(game_data1$Date, format="%Y-%m-%d") # parse the full date, not only the year
game_data1$Year <- year(game_data1$Date) # create new year column

# Game data for year 2021-2022
game_data2 <- game_data2[-5,] 
game_data2 <- game_data2[,-c(4,1)] 
game_data2 <- game_data2 %>% row_to_names(row_number = 1) 
game_data2 <- game_data2[-c(20,21,42,43,64,65,86,87),] 
game_data2 <- game_data2[,-c(21,22,23,24,25,26,27,28,29,30)] 
game_data2 <- game_data2[,-c(21,22,23,24,25,26,27)]
colnames(game_data2)[6] = "opp points"
colnames(game_data2)[4] = "Outcome"
game_data2$Date <- as.POSIXct(game_data2$Date, format="%Y-%m-%d")
game_data2$Year <- year(game_data2$Date) 

# Game data for year 2020-2021
game_data3 <- game_data3[,-c(4,1)] 
game_data3 <- game_data3 %>% row_to_names(row_number = 1) 
colnames(game_data3)[6] = "opp points"
colnames(game_data3)[4] = "Outcome"
game_data3 <- game_data3[,-c(21,22,23,24,25,26,27,28,29,30)] 
game_data3 <- game_data3[-c(20,21,43,44,65,66),] 
game_data3 <- game_data3[,-c(21,22,23,24,25,26,27)] 
game_data3$Date <- as.POSIXct(game_data3$Date, format="%Y-%m-%d")
game_data3$Year <- year(game_data3$Date) 
```
#### Rank Data Wrangling and Cleaning 
```{r}
rank_data1 <- rank_data1[,-c(3,4,5,6,7)]
colnames(rank_data1)[2] = "Rank"
rank_data1 <- rank_data1 %>%
  mutate(ABV = c("BOS", "CLE","PHI", "MIL","DEN","MEM","NYK","PHO","GSW","TOR","SAC","NOP","BRK","CHI","LAC", "LAL","ATL","MIA","DAL","MIN","OKC","WAS","UTA","ORL","POR","IND","CHO","DET","HOU","SAS"))

rank_data2 <- rank_data2[,-c(3,4,5,6,7)]
colnames(rank_data2)[2] = "Rank"
rank_data2 <- rank_data2 %>%
  mutate(ABV = c("BOS","PHO","GSW","UTA","MEM","MIA", "MIL","DAL","PHI","DEN","MIN","BRK","TOR","ATL","CLE","CHO","LAC", "NYK", "NOP","CHI","SAS","LAL","IND","WAS","SAC","POR","DET","OKC","ORL","HOU"))

rank_data3 <- rank_data3[,-c(3,4,5,6,7)]
colnames(rank_data3)[2] = "Rank"
rank_data3 <- rank_data3 %>%
  mutate(ABV = c("UTA","LAC","MIL","PHO","PHI","BRK","DEN","LAL","DAL","ATL","POR","BOS","GSW","MEM","NYK","MIA","IND","TOR","NOP","CHI","SAS","WAS","CHO","SAC","DET","MIN","HOU","ORL","CLE","OKC"))
```
*****
#### Our game and rank tables are now fully processed. Joining them using the left_join function will ensure that each case has the correct information from both datasets. Furthermore, combining the joined tables with the rbind function will produce a final table with a nearly 300-game sample size to investigate.
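
#### To make the join-then-stack step concrete, here is a minimal sketch on made-up data. The `toy_*` tables and their values are hypothetical; only `left_join` and `rbind` mirror the calls used in this notebook.
```{r}
library(dplyr)

# Hypothetical toy game logs for two seasons.
toy_games_a <- data.frame(Opp = c("BOS", "PHI"), Tm = c(110, 98))
toy_games_b <- data.frame(Opp = c("PHI", "UTA"), Tm = c(121, 105))

# Hypothetical toy rank tables keyed by team abbreviation.
toy_ranks_a <- data.frame(ABV = c("BOS", "PHI"), Rank = c(1, 3))
toy_ranks_b <- data.frame(ABV = c("PHI", "UTA"), Rank = c(9, 4))

# left_join keeps every game row and attaches the opponent's rank for that season.
joined_a <- left_join(toy_games_a, toy_ranks_a, by = c("Opp" = "ABV"))
joined_b <- left_join(toy_games_b, toy_ranks_b, by = c("Opp" = "ABV"))

# rbind stacks the season tables; the column names must match.
toy_all <- rbind(joined_a, joined_b)
toy_all
```
#### Joining each season separately before stacking lets the same opponent carry a different rank in each season.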

*****

#### Combining Game Data and Rank Data
```{r}
# Join Game and Ranks 
game_data1 <- game_data1 %>%
  left_join(rank_data1, by = c("Opp"="ABV"))
game_data2 <- game_data2 %>%
  left_join(rank_data2, by = c("Opp"="ABV"))
game_data3 <- game_data3 %>%
  left_join(rank_data3, by = c("Opp"="ABV"))

game_data3 <- game_data3[-20,] 

# Final Game Data Table 
games <- rbind(game_data3, game_data2, game_data1)
games <- games[-20,]

# Convert character columns to numeric
num_cols <- c("G", "Tm", "opp points", "FG", "FGA", "FG%", "3P", "3PA", "3P%", "FT", "FTA", "FT%",
              "ORB", "TRB", "AST", "STL", "BLK", "TOV", "PF", "Rank")
games[, num_cols] <- lapply(games[, num_cols], as.numeric)

# Inspecting Final Table 
head(games)
# summary(games)
```
*****
#### Using basic functions such as `head` and `summary`, we can confirm that the final `games` table is formatted correctly and ready for further analysis.
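
#### One way to double-check the numeric conversion is to look at column classes and count the `NA` values it introduces. The sketch below does this on a made-up table (`toy_stats` and `toy_num` are hypothetical names), where a leftover header row is assumed to have survived cleaning.
```{r}
# Hypothetical toy table with numbers stored as character, as scraped tables usually are.
toy_stats <- data.frame(
  AST = c("25", "31", "AST"),   # a leftover header row hides in row 3
  TOV = c("12", "9", "TOV"),
  stringsAsFactors = FALSE
)

# as.numeric turns non-numeric strings into NA (normally with a warning).
toy_num <- toy_stats
toy_num[] <- lapply(toy_stats, function(col) suppressWarnings(as.numeric(col)))

# Column classes and NA counts reveal rows that still need cleaning.
sapply(toy_num, class)
colSums(is.na(toy_num))
```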

#### The three major categories explored in this project cover the most impactful elements of a game: Offense, Defense, and Shooting. We investigate each of these areas below.
*****

#### Effect of Assist-Turnover Ratio on Outcomes (Offense)
```{r}
game1 <- games %>%
  group_by(Outcome, Year, Rank) %>%
  summarise(TeamRank = paste(Team, "Rank -", Rank, sep = " "), AST = AST, TOV = TOV, AT = AST/TOV)

game_1 <- game1 %>%
  mutate(Outcome = ifelse(Outcome == "W", 1,0))

# Null hypothesis: the Assist-Turnover ratio has no impact on game outcome

Off_corr <- cor(game_1$AT, game_1$Outcome, method = "pearson")
Off_model <- wilcox.test(game1$AT ~ game1$Outcome, conf.int = TRUE)

# Mean of Assist-Turnover ratio for Wins
off_wins <- game1 %>%
  select(Outcome, Year,Rank, AT) %>%
  filter(Outcome == "W")
offW_mean <- mean(off_wins$AT) 
  
# Mean of Assist-Turnover ratio for Losses
off_losses <- game1 %>%
  select(Outcome, Year, Rank, AT) %>%
  filter(Outcome == "L")
offL_mean <- mean(off_losses$AT)

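print(Off_corr) # Correlation
print(Off_model) # U Test
print(offW_mean) # Mean of Assist-Turnover ratio for Wins
print(offL_mean) # Mean of Assist-Turnover ratio for Losses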
```
*****
#### Let's define the variables we focused on above: An Assist is a pass made to a teammate which leads directly to a score. A Turnover is when a player commits an error and allows the opposing team to gain possession of the ball. 

#### Using the Mann-Whitney U test, we examine the Assist-Turnover ratio's impact on the game outcome. The test shows a significant difference in the ratio between games won and lost (p-value: 0.0016). Comparing the calculated means, we see that wins come with more assists per turnover, that is, a higher ratio.
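
#### The sketch below walks through the same steps on a small made-up sample: compute the Assist-Turnover ratio, run the Mann-Whitney U test with `wilcox.test`, and compare group means. The `toy_at` data are hypothetical and chosen only to show the mechanics; they are not Bucks results.
```{r}
# Hypothetical assist and turnover counts for eight games (not Bucks data).
toy_at <- data.frame(
  Outcome = c("W", "W", "W", "W", "L", "L", "L", "L"),
  AST = c(30, 28, 26, 32, 20, 22, 24, 19),
  TOV = c(10, 12, 9, 11, 15, 14, 13, 16)
)
toy_at$AT <- toy_at$AST / toy_at$TOV

# Mann-Whitney U test (wilcox.test with two independent groups).
toy_test <- wilcox.test(AT ~ Outcome, data = toy_at)

# Group means show the direction of the difference.
toy_means <- tapply(toy_at$AT, toy_at$Outcome, mean)

toy_test$p.value
toy_means
```
#### The test only says whether the two groups differ; the group means are what indicate which group has the higher ratio.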

*****
#### Supporting Visualization
```{r}
game1_filtered <- filter(game1, Year != 2020)

p <- ggplot(game1_filtered, aes(x = AST, y = TOV, color = Outcome, text = TeamRank)) +
  geom_point() +
  labs(x = "Assists", y = "Turnovers", color = "Game Outcome") +
  theme_minimal() +
  facet_wrap(. ~ Year) +
  ggtitle("Assists vs. Turnovers by Game Outcome and Year") +
  theme(panel.spacing.x = unit(5, "mm"))

ggplotly(p, tooltip = c("text"), width = 900, height = 300)
```

*****
#### The spread of the points indicates only a weak correlation, consistent with the coefficient we found earlier (0.144). Even so, we can observe that most games with a high Assist-Turnover ratio are wins. As such, the Bucks' coaching staff should emphasize smart, controlled ball movement to minimize turnovers and maximize assists.

*****
#### Effect of Defensive Possessions on Outcomes (Defense)
```{r}
game2 <- games %>%
  group_by(Outcome, Year, Rank) %>%
  summarise(TeamRank = paste(Team, "Rank -", Rank, sep = " "), PF = PF,  DRB = TRB - ORB, DES = (DRB + BLK + STL - PF), Def_ratio = PF/ DES)

game_2 <- game2 %>%
  mutate(Outcome = ifelse(Outcome == "W", 1,0))

# Null hypothesis: the Defensive ratio has no impact on game outcome

def_corr <- cor(game_2$Def_ratio, game_2$Outcome, method = "pearson")
def_model <- wilcox.test(game2$Def_ratio ~ game2$Outcome, conf.int = TRUE)

# Mean of Defensive ratio for Wins
def_wins <- game2 %>%
  select(Outcome, Year, Rank, Def_ratio) %>%
  filter(Outcome == "W")
defW_mean <- mean(def_wins$Def_ratio) 
  
# Mean of Defensive ratio for Losses
def_losses <- game2 %>%
  select(Outcome, Year, Rank, Def_ratio) %>%
  filter(Outcome == "L")
defL_mean <- mean(def_losses$Def_ratio)

print(def_corr) # Correlation
print(def_model) # U Test
print(defW_mean) # Mean of Defensive ratio for Wins
print(defL_mean) # Mean of Defensive ratio for Losses

```
*****
#### Let's define the variables we focused on here: A Personal Foul (PF) is any illegal contact with an offensive player, including pushing and striking. The Defensive Efficiency Score (DES) is the difference between good defensive actions (defensive rebounds, blocks, and steals) and bad ones (personal fouls). The Defensive ratio divides personal fouls by the DES.

#### Here, we explore the Defensive ratio's impact on the game outcome. The Mann-Whitney U test shows a significant difference between games won and lost (p-value: 0.0003). Comparing the calculated means, we see that wins come with fewer personal fouls.
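
#### The Defensive Efficiency Score and Defensive ratio computed in the chunk above can be worked through by hand for one game. The box score below is made up (`toy_box` is a hypothetical name); the formulas follow the ones used in this notebook.
```{r}
# Hypothetical single-game box score (not Bucks data).
toy_box <- data.frame(TRB = 48, ORB = 10, BLK = 5, STL = 7, PF = 20)

# Defensive rebounds are total rebounds minus offensive rebounds.
toy_box$DRB <- toy_box$TRB - toy_box$ORB

# DES: defensive rebounds, blocks, and steals minus personal fouls.
toy_box$DES <- toy_box$DRB + toy_box$BLK + toy_box$STL - toy_box$PF

# Defensive ratio: personal fouls divided by DES.
toy_box$Def_ratio <- toy_box$PF / toy_box$DES

toy_box
```
#### With these numbers, DRB is 38, DES is 38 + 5 + 7 - 20 = 30, and the Defensive ratio is 20 / 30, about 0.67.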
*****

#### Supporting Visualization
```{r}
game2_filtered <- filter(game2, Year != 2020)

p2 <- ggplot(game2_filtered, aes(x = DES, y = PF , color = Outcome, text =TeamRank)) +
  geom_point() +
  labs(x = "Defensive Efficiency Score", y = "Personal Fouls Committed", color = "Game Outcome") +
  theme_minimal() +
  facet_wrap(. ~ Year) +
  ggtitle("Efficiency of Defensive Possessions") +
  theme(panel.spacing.x = unit(5, "mm"))

ggplotly(p2, tooltip = c("text"), width = 900, height = 300)

```
*****

#### In the graphs above, we can see a moderate negative correlation between our variables. This result is consistent with the correlation we found earlier (-0.277). This relationship suggests that personal fouls decrease as DES increases. Combining the correlation and the difference of means, we can advise the Bucks' coaching staff to emphasize strategic positioning and team defense to win the most games.

*****
#### Effect of Three-Point Field Goal Attempts on Outcomes (Shooting)
```{r}
game3 <- games %>%
  group_by(Outcome, Year, Rank) %>%
  summarise(TeamRank = paste(Team, "Rank -", Rank, sep = " "), `3PA` = `3PA`, `3P%` = `3P%`, Percent_attempt = (`3P%`/`3PA`))

game_3 <- game3 %>%
  mutate(Outcome = ifelse(Outcome == "W", 1,0))

# Null hypothesis: the Percent-Attempt ratio has no impact on game outcome

shot_corr <- cor(game_3$Percent_attempt, game_3$`3PA`, method = "pearson")
shot_model <- wilcox.test(game3$Percent_attempt ~ game3$Outcome, conf.int = TRUE)

# Mean of Percent-Attempt ratio for Wins
shot_wins <- game3 %>%
  select(Outcome, Year, Rank, Percent_attempt) %>%
  filter(Outcome == "W")
shotW_mean <- mean(shot_wins$Percent_attempt) 
  
# Mean of Percent-Attempt ratio for Losses
shot_losses <- game3 %>%
  select(Outcome, Year, Rank, Percent_attempt) %>%
  filter(Outcome == "L")
shotL_mean <- mean(shot_losses$Percent_attempt)

print(shot_corr) # Correlation
print(shot_model) # U Test
print(shotW_mean) # Mean of Percent-Attempt ratio for Wins
print(shotL_mean) # Mean of Percent-Attempt ratio for Losses
```
*****

#### Let's define the variables we focused on above: A three-point attempt is a shot released from any position behind the three-point arc. The percent-attempt ratio divides three-point accuracy by the number of attempts, so it takes both shooting accuracy and volume into account to measure shooting efficiency.

#### Here, we explore the percent-attempt ratio's impact on the game outcome. The Mann-Whitney U test, once again, shows a significant difference between games won and lost (p-value: 0.0003). Comparing the calculated means (`shotW_mean` and `shotL_mean`) shows the direction of that difference in the percent-attempt ratio.
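
#### As a worked example of the percent-attempt ratio, the sketch below computes it for two made-up games. `toy_shots` is a hypothetical name, and the sketch assumes `3P%` is stored as a proportion (for example 0.400) rather than a percentage.
```{r}
# Hypothetical three-point lines for two games (not Bucks data).
toy_shots <- data.frame(
  `3P`  = c(12, 18),
  `3PA` = c(30, 50),
  check.names = FALSE
)

# Three-point percentage as a proportion of attempts made.
toy_shots$`3P%` <- toy_shots$`3P` / toy_shots$`3PA`

# Percent-attempt ratio: accuracy divided by number of attempts.
toy_shots$Percent_attempt <- toy_shots$`3P%` / toy_shots$`3PA`

toy_shots
```
#### The first game (12 of 30) gives 0.40 / 30, about 0.0133, while the second (18 of 50) gives 0.36 / 50 = 0.0072, so the lower-volume, more accurate game scores higher.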

*****
#### Supporting Visualization
```{r}
game3_filtered <- filter(game3, Year != 2020)

p3 <- ggplot(game3_filtered, aes(x = Percent_attempt, y = `3PA` , color = Outcome, text =TeamRank)) +
  geom_point()+
  labs(x = "Shooting Accuracy Score", y = "3 Point Field Goal Attempts", color = "Game Outcome") +
  theme_minimal() +
  facet_wrap(. ~ Year) +
  ggtitle("Impact of 3 Point Field Goal Attempts") +
  theme(panel.spacing.x = unit(5, "mm"))

ggplotly(p3, tooltip = c("text"), width = 900, height = 300)
```
*****
#### Finally, we can see the strongest negative correlation between our variables (-0.592). This relationship suggests that as three-point attempts decrease, the percent-attempt ratio increases. Combining the correlation and the U test, we can advise the Bucks' coaching staff to strategize and create open shots instead of frequent ones.

*****
### Conclusion

#### After examining the three most impactful elements (Offense, Defense, and Shooting), we provided the Milwaukee Bucks coaching staff with meaningful insights backed by reliable statistical tests. Now, it is up to their organization to implement them appropriately as they enter the playoffs.

*****